Setup

## **** Utilized Cores **** = 12
## $subsetGenes
## [1] "protein_coding"
## 
## $subsetCells
## [1] "F"
## 
## $resolution
## [1] "0.4"
## 
## $resultsPath
## [1] "/sc/orga/projects/pd-omics/brian/PD_scRNAseq/Results/subsetGenes-protein_coding__subsetCells-F__Resolution-0.4"
## 
## $interactive
## [1] "F"

** /sc/orga/projects/pd-omics/brian/PD_scRNAseq/Results/subsetGenes-protein_coding__subsetCells-F__Resolution-0.4 **

Load Libraries & Report Versions

## Error in library(ulimit): there is no package called 'ulimit'
## Error in library(benchmarkme): there is no package called 'benchmarkme'
## R version 3.5.1 (2018-07-02)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: CentOS release 6.9 (Final)
## 
## Matrix products: default
## BLAS/LAPACK: /hpc/packages/minerva-common/intel/parallel_studio_xe_2018/compilers_and_libraries_2018.1.163/linux/mkl/lib/intel64_lin/libmkl_gf_lp64.so
## 
## locale:
## [1] C
## 
## attached base packages:
##  [1] parallel  stats4    grid      stats     graphics  grDevices utils    
##  [8] datasets  methods   base     
## 
## other attached packages:
##  [1] DESeq2_1.20.0               SummarizedExperiment_1.10.1
##  [3] DelayedArray_0.6.1          BiocParallel_1.14.2        
##  [5] matrixStats_0.53.1          Biobase_2.40.0             
##  [7] GenomicRanges_1.32.4        GenomeInfoDb_1.16.0        
##  [9] IRanges_2.14.10             S4Vectors_0.18.3           
## [11] BiocGenerics_0.26.0         biomaRt_2.36.1             
## [13] ComplexHeatmap_1.18.0       DT_0.4                     
## [15] ggrepel_0.8.0               shiny_1.1.0                
## [17] reshape2_1.4.3              viridis_0.5.1              
## [19] viridisLite_0.3.0           plotly_4.7.1               
## [21] knitr_1.20                  gridExtra_2.3              
## [23] dplyr_0.7.6                 Seurat_2.3.4               
## [25] Matrix_1.2-14               cowplot_0.9.3              
## [27] ggplot2_3.0.0              
## 
## loaded via a namespace (and not attached):
##   [1] snow_0.4-3             backports_1.1.2        circlize_0.4.4        
##   [4] Hmisc_4.1-1            plyr_1.8.4             igraph_1.2.1          
##   [7] lazyeval_0.2.1         splines_3.5.1          digest_0.6.17         
##  [10] foreach_1.4.4          htmltools_0.3.6        lars_1.2              
##  [13] gdata_2.18.0           magrittr_1.5           checkmate_1.8.5       
##  [16] memoise_1.1.0          cluster_2.0.7-1        mixtools_1.1.0        
##  [19] ROCR_1.0-7             annotate_1.58.0        R.utils_2.7.0         
##  [22] prettyunits_1.0.2      colorspace_1.3-2       blob_1.1.1            
##  [25] crayon_1.3.4           RCurl_1.95-4.11        jsonlite_1.5          
##  [28] genefilter_1.62.0      bindr_0.1.1            survival_2.42-6       
##  [31] zoo_1.8-3              iterators_1.0.10       ape_5.1               
##  [34] glue_1.3.0             gtable_0.2.0           zlibbioc_1.26.0       
##  [37] XVector_0.20.0         GetoptLong_0.1.7       kernlab_0.9-26        
##  [40] shape_1.4.4            prabclus_2.2-6         DEoptimR_1.0-8        
##  [43] scales_0.5.0           mvtnorm_1.0-8          DBI_1.0.0             
##  [46] Rcpp_1.0.0             metap_0.9              dtw_1.20-1            
##  [49] xtable_1.8-2           progress_1.2.0         htmlTable_1.12        
##  [52] reticulate_1.7         foreign_0.8-70         bit_1.1-14            
##  [55] proxy_0.4-22           mclust_5.4.1           SDMTools_1.1-221      
##  [58] Formula_1.2-3          tsne_0.1-3             htmlwidgets_1.2       
##  [61] httr_1.3.1             gplots_3.0.1           RColorBrewer_1.1-2    
##  [64] fpc_2.1-11             acepack_1.4.1          modeltools_0.2-22     
##  [67] ica_1.0-2              pkgconfig_2.0.2        XML_3.98-1.12         
##  [70] R.methodsS3_1.7.1      flexmix_2.3-14         nnet_7.3-12           
##  [73] locfit_1.5-9.1         tidyselect_0.2.4       rlang_0.2.1           
##  [76] later_0.7.3            AnnotationDbi_1.42.1   munsell_0.5.0         
##  [79] tools_3.5.1            RSQLite_2.1.1          ggridges_0.5.0        
##  [82] evaluate_0.11          stringr_1.3.1          yaml_2.1.19           
##  [85] bit64_0.9-7            fitdistrplus_1.0-9     robustbase_0.93-1.1   
##  [88] caTools_1.17.1         purrr_0.2.5            RANN_2.6              
##  [91] bindrcpp_0.2.2         pbapply_1.3-4          nlme_3.1-137          
##  [94] mime_0.5               R.oo_1.22.0            hdf5r_1.0.0           
##  [97] compiler_3.5.1         rstudioapi_0.7         png_0.1-7             
## [100] geneplotter_1.58.0     tibble_1.4.2           stringi_1.2.4         
## [103] lattice_0.20-35        trimcluster_0.1-2.1    pillar_1.3.0          
## [106] lmtest_0.9-36          GlobalOptions_0.1.0    data.table_1.11.8     
## [109] bitops_1.0-6           irlba_2.3.2            httpuv_1.4.5          
## [112] R6_2.3.0               latticeExtra_0.6-28    promises_1.0.1        
## [115] KernSmooth_2.23-15     codetools_0.2-15       MASS_7.3-50           
## [118] gtools_3.8.1           assertthat_0.2.0       rprojroot_1.3-2       
## [121] rjson_0.2.20           withr_2.1.2            GenomeInfoDbData_1.1.0
## [124] diptest_0.75-7         doSNOW_1.0.16          hms_0.4.2             
## [127] rpart_4.1-13           tidyr_0.8.1            class_7.3-14          
## [130] rmarkdown_1.10         segmented_0.5-3.0      Rtsne_0.13            
## [133] base64enc_0.1-3
## [1] "Seurat  2.3.4"

Raise Memory Limit

Rstudio has a default memory limit of only 1GB. To override this, detect the true memory available and set a new limit.

## Error in loadNamespace(name): there is no package called 'benchmarkme'
## Error in strsplit(RAM, " "): object 'RAM' not found
## Error in eval(quote(list(...)), env): object 'RAM' not found
## Error in loadNamespace(name): there is no package called 'ulimit'

Clean Metadata

Add Metadata

## Warning in instance$preRenderHook(instance): It seems your data is too
## big for client-side DataTables. You may consider server-side processing:
## https://rstudio.github.io/DT/server.html
## Warning in as.list.environment(environment(), all = TRUE): NAs introduced
## by coercion
## Error in 0:subsetCells: NA/NaN argument

Filter & Normalize Data

Subset Genes by Biotype

Include only subsets of genes by type. Biotypes from: https://useast.ensembl.org/info/genome/genebuild/biotypes.html

subsetBiotypes <- function(pbmc, subsetGenes){
  if( subsetGenes!=F ){
    cat(paste("Subsetting genes:",subsetGenes))
    # If the gene_biotypes file exists, import csv. Otherwise, get from biomaRt
    if(file_test("-f", file.path(root,"Data/gene_biotypes.csv"))){
      biotypes <- read.csv(file.path(root,"Data/gene_biotypes.csv"))
    }
    else {
      ensembl <- useMart(biomart="ENSEMBL_MART_ENSEMBL", host="grch37.ensembl.org",
                       dataset="hsapiens_gene_ensembl") 
      ensembl <- useDataset(mart = ensembl, dataset = "hsapiens_gene_ensembl")
      listFilters(ensembl)
      listAttributes(ensembl)   
      biotypes <- getBM(attributes=c("hgnc_symbol", "gene_biotype"), filters="hgnc_symbol",
            values=row.names(pbmc@data), mart=ensembl) 
      write.csv(biotypes, file.path(root,"Data/gene_biotypes.csv"), quote=F, row.names=F)
    } 
    # Subset data by creating new Seurat object (annoying but necessary)
    geneSubset <- biotypes[biotypes$gene_biotype==subsetGenes,"hgnc_symbol"] 
    
    cat(paste(dim(pbmc@raw.data[geneSubset, ])[1],"/", dim(pbmc@raw.data)[1], 
                "genes are", subsetGenes))
    # Add back into pbmc 
    subset.matrix <- pbmc@raw.data[geneSubset, ] # Pull the raw expression matrix from the original Seurat object containing only the genes of interest
    pbmc_sub <- CreateSeuratObject(subset.matrix) # Create a new Seurat object with just the genes of interest
    orig.ident <- row.names(pbmc@meta.data) # Pull the identities from the original Seurat object as a data.frame
    pbmc_sub <- AddMetaData(object = pbmc_sub, metadata = pbmc@meta.data) # Add the idents to the meta.data slot
    pbmc_sub <- SetAllIdent(object = pbmc_sub, id = "ident") # Assign identities for the new Seurat object
    pbmc <- pbmc_sub
    rm(list = c("pbmc_sub","geneSubset", "subset.matrix", "orig.ident")) 
  } 
}

subsetBiotypes(pbmc, subsetGenes)
## Subsetting genes: protein_coding14827 / 24914 genes are protein_coding

Subset Genes by Variance

** Important!**: * In ScaleData… + Specify do.par = F (unless you have parallel processing set up properly, this will cause your script to crash) + Specify num.cores = nCores (to use all available cores, determined by parallel::detectCores())

Regress out: number of unique transcripts (nUMI), % mitochondrial transcripts (percent.mito)

## Regressing out: nUMI, percent.mito
## Warning in RegressOutResid(object = object, vars.to.regress =
## vars.to.regress, : For parallel processing, please set do.par to TRUE.
## 
## Time Elapsed:  23.1420209407806 secs
## Scaling data matrix

Filtered Dimensions

## An object of class seurat in project RAJ_13357 
##  24914 genes across 19144 samples.

Dimensionality Reduction

PCA

ProjectPCA scores each gene in the dataset (including genes not included in the PCA) based on their correlation with the calculated components. Though we don’t use this further here, it can be used to identify markers that are strongly correlated with cellular heterogeneity, but may not have passed through variable gene selection. The results of the projected PCA can be explored by setting use.full=T in the functions above

  • Other Dim Reduction Methods in Seurat
    • RunCCA()
    • RunMultiCCA()
    • RunDiffusion()
    • RunPHATE()
    • RunICA()
## Working dimension size 27
## Initializing starting vector v with samples from standard normal distribution.
## Use `set.seed` first for reproducibility.
## irlba: using fast C implementation

Significant PCs

Determine statistically significant PCs for further analysis. NOTE: This process can take a long time for big datasets, comment out for expediency. More approximate techniques such as those implemented in PCElbowPlot() can be used to reduce computation time

Find Cell Clusters

We first construct a KNN graph based on the euclidean distance in PCA space, and refine the edge weights between any two cells based on the shared overlap in their local neighborhoods (Jaccard similarity). To cluster the cells, we apply modularity optimization techniques such as the Louvain algorithm (default) or SLM [SLM, Blondel et al., Journal of Statistical Mechanics], to iteratively group cells together, with the goal of optimizing the standard modularity function.

On Resolution
NA

  • To further increase speed, you can employ an approximate nearest neighbor search via the RANN package by increasing the nn.eps + parameter. Setting this at 0 (the default) represents an exact neighbor search.
  • By default, we perform 100 random starts for clustering and select the result with highest modularity. You can lower this through the n.start parameter to reduce clustering time.
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 19144
## Number of edges: 769919
## 
## Running Louvain algorithm...
## Random start: 1
## Iteration: 1
## Modularity: 0.7613
## Iteration: 2
## Modularity: 0.7681
## Iteration: 3
## Modularity: 0.7681
## 
## Random start: 2
## Iteration: 1
## Modularity: 0.7540
## Iteration: 2
## Modularity: 0.7699
## Iteration: 3
## Modularity: 0.7699
## 
## Random start: 3
## Iteration: 1
## Modularity: 0.7661
## Iteration: 2
## Modularity: 0.7707
## Iteration: 3
## Modularity: 0.7707
## 
## Random start: 4
## Iteration: 1
## Modularity: 0.7603
## Iteration: 2
## Modularity: 0.7683
## Iteration: 3
## Modularity: 0.7683
## 
## Random start: 5
## Iteration: 1
## Modularity: 0.7594
## Iteration: 2
## Modularity: 0.7664
## Iteration: 3
## Modularity: 0.7664
## 
## Random start: 6
## Iteration: 1
## Modularity: 0.7620
## Iteration: 2
## Modularity: 0.7695
## Iteration: 3
## Modularity: 0.7701
## Iteration: 4
## Modularity: 0.7701
## 
## Random start: 7
## Iteration: 1
## Modularity: 0.7602
## Iteration: 2
## Modularity: 0.7684
## Iteration: 3
## Modularity: 0.7684
## 
## Random start: 8
## Iteration: 1
## Modularity: 0.7598
## Iteration: 2
## Modularity: 0.7652
## Iteration: 3
## Modularity: 0.7652
## 
## Random start: 9
## Iteration: 1
## Modularity: 0.7621
## Iteration: 2
## Modularity: 0.7679
## Iteration: 3
## Modularity: 0.7679
## 
## Random start: 10
## Iteration: 1
## Modularity: 0.7530
## Iteration: 2
## Modularity: 0.7702
## Iteration: 3
## Modularity: 0.7702
## 
## Maximum modularity in 10 random starts: 0.7707
## Number of communities: 6
## Elapsed time: 9 seconds
## Parameters used in latest FindClusters calculation run on: 2019-01-16 13:42:52
## =============================================================================
## Resolution: 0.4
## -----------------------------------------------------------------------------
## Modularity Function    Algorithm         n.start         n.iter
##      1                   1                 10              10
## -----------------------------------------------------------------------------
## Reduction used          k.param          prune.SNN
##      pca                 30                0.0667
## -----------------------------------------------------------------------------
## Dims used in calculation
## =============================================================================
## 1 2 3 4 5 6 7 8 9 10

t-SNE

As input to the tSNE, we suggest using the same PCs as input to the clustering analysis, although computing the tSNE based on scaled gene expression is also supported using the genes.use argument.

** Important!**: Specify num_threads=0 in ‘RunTSNE’ to use all available cores.

“FItSNE”, a new fast implementation of t-SNE, is also available through RunTSNE. However FItSNE must first be setup on your computer.

## Read the 19144 x 10 data matrix successfully!
## Using no_dims = 2, perplexity = 30.000000, and theta = 0.500000
## Computing input similarities...
## Normalizing input...
## Building tree...
##  - point 0 of 19144
##  - point 10000 of 19144
## Done in 9.44 seconds (sparsity = 0.006869)!
## Learning embedding...
## Iteration 50: error is 104.835938 (50 iterations in 19.62 seconds)
## Iteration 100: error is 104.709901 (50 iterations in 25.21 seconds)
## Iteration 150: error is 101.632348 (50 iterations in 21.25 seconds)
## Iteration 200: error is 101.580125 (50 iterations in 22.61 seconds)
## Iteration 250: error is 101.472564 (50 iterations in 20.42 seconds)
## Iteration 300: error is 4.413932 (50 iterations in 13.54 seconds)
## Iteration 350: error is 4.166479 (50 iterations in 14.89 seconds)
## Iteration 400: error is 4.016408 (50 iterations in 15.17 seconds)
## Iteration 450: error is 3.905556 (50 iterations in 15.98 seconds)
## Iteration 500: error is 3.818781 (50 iterations in 16.77 seconds)
## Iteration 550: error is 3.750168 (50 iterations in 16.79 seconds)
## Iteration 600: error is 3.693278 (50 iterations in 16.65 seconds)
## Iteration 650: error is 3.645445 (50 iterations in 16.63 seconds)
## Iteration 700: error is 3.604156 (50 iterations in 16.93 seconds)
## Iteration 750: error is 3.568189 (50 iterations in 16.37 seconds)
## Iteration 800: error is 3.536739 (50 iterations in 17.20 seconds)
## Iteration 850: error is 3.509031 (50 iterations in 17.36 seconds)
## Iteration 900: error is 3.484408 (50 iterations in 17.27 seconds)
## Iteration 950: error is 3.462351 (50 iterations in 17.38 seconds)
## Iteration 1000: error is 3.443001 (50 iterations in 17.65 seconds)
## Fitting performed in 355.69 seconds.

t-SNE + Metadata Plots

tSNE Disease

## t-SNE Metadata plot for  dx

Mutations

## t-SNE Metadata plot for  mut

Gender

## t-SNE Metadata plot for  Gender

Age

## t-SNE Metadata plot for  Age

### Individual ID

## t-SNE Metadata plot for  ID

Cluster Biomarkers

NA

Shown here: Biomarkers of each cluster vs. all other clusters.

Biomarkers Data

All Biomarkers

Top Biomarkers

Cluster Biomarker: Violin Plots

Cluster 0

Cluster 1

Cluster 2

Cluster 3

Cluster 4

Cluster 5

Cluster Biomarker: Volcano Plots

## 
## ### Cluster  0 : Volcano

## 
## 
## ### Cluster  1 : Volcano

## 
## 
## ### Cluster  2 : Volcano
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_text_repel).

## 
## 
## ### Cluster  3 : Volcano
## Warning: Removed 1 rows containing missing values (geom_point).

## Warning: Removed 1 rows containing missing values (geom_text_repel).

## 
## 
## ### Cluster  4 : Volcano
## Warning: Removed 1 rows containing missing values (geom_point).

## Warning: Removed 1 rows containing missing values (geom_text_repel).

## 
## 
## ### Cluster  5 : Volcano
## Warning: Removed 7 rows containing missing values (geom_point).
## Warning: Removed 5 rows containing missing values (geom_text_repel).

Map Clusters to Known Biomarkers

  • Known Monocytes Biomarkers
  • Classical: CD14++ / CD16–
  • Intermediate: CD14++ / CD16+
  • Nonclassical: CD14+ / CD16++ (not captured in this data)

The following plots show the absolute expression of each biomarker, as opposed to avg_logFC which is dependent on the expression patterns of other cell types being compared.

Markers Dataframe

## Warning in instance$preRenderHook(instance): It seems your data is too
## big for client-side DataTables. You may consider server-side processing:
## https://rstudio.github.io/DT/server.html

Defining Cell-types

Identify Cell Types By DEGs

  • Classification Approach 1
    • Go through each cluster and see what biomarkers they have through DGE.
    • If a cluster has a known gene marker within their top N biomarkers (with the right valence +/-), classify all cells within that cluster as a given cell type.
    • Drawback: The biomarkers that DGE identifies for each cluster will depend on the composition of the other cells. So some clusters may remain unidentified simply by the fact that they didn’t have the proper cells to be compared to.
## t-SNE Metadata plot for  CellType_DGE

Identify Cell Types By Average Expression

  • Classification Approach 2
    • Alternatively, you could classify cells by their absolute expression values. E.g. If any cell expresses CD16 more than the average expression of CD16 across all cells, then classify it as CD16+.
    • Drawback: The threshold for classifying a cell as either CD16+ or CD16– is defined rather arbitrarily (the average expression of CD16 in this sample isn’t necessarily biologically meaningful).

## t-SNE Metadata plot for  CellType_AvgExp

Known Biomarkers: Heatmaps

Cells Separated

markerDF <- get_markerDF(pbmc, markerList, 
             meta_vars =c("barcode", "dx", "mut","ID","post_clustering", "percent.mito","nGene", "nUMI",
                          "CellType_DGE","CellType_AvgExp"))
Spectral <- grDevices::colorRampPalette(RColorBrewer::brewer.pal(length(unique(pbmc@meta.data$mut)), "Spectral"))

if (interactive==T){
  # Spectral <- heatmaply::Spectral(length(unique(pbmc@meta.data$mut)))
  markerMelt <- reshape2::acast(markerDF, Cell~Gene, value.var="Expression", fun.aggregate = mean, drop = F, fill = 0) 
  heatmaply::heatmaply(markerMelt,  key.title="Expression",#plot_method= "ggplot",
        k_row = dim(markerMelt)[2], dendrogram = "row",
        showticklabels = c(T, F), xlab = "Known Markers", ylab = "Cells", column_text_angle = 45, 
        row_side_colors =  pbmc@meta.data[,c("dx","mut", "CellType_DGE")], row_side_palette = Spectral
        )  %>%  colorbar(tickfont = list(size = 12), titlefont = list(size = 14), which = 2)  %>% 
        colorbar(tickfont = list(size = 12), titlefont = list(size = 14), which = 1)
}else{ 
  # markerDF_sub <-subset(markerDF, Gene==markerList[1])  
  # var_to_colors(markerDF_sub, "post_clustering")  
  # library(pheatmap)
  # pheatmap(markerMelt, annotation_row = markerDF_sub[c("dx","mut","CellType_DGE", "CellType_AvgExp")])
  # pheatmap(markerMelt, kmeans_k = NA, annotation_row = markerDF_sub[c("dx","mut","CellType_DGE", "CellType_AvgExp")],
  #         cluster_cols = F, cutree_rows = length(unique(markerDF$post_clustering)),  angle_col=45 )
  library(RColorBrewer) 
  var_to_colors <- function(markerDF, metaVar){
    colors <- brewer.pal(length(unique(markerDF[metaVar]) ), "Dark2")
     sample(colors, length(unique(markerDF[metaVar])), replace = TRUE, prob = NULL)
    # metaColors <- colors[ subset(markerDF, Gene==markerList[1])[metaVar][,1] %>% as.factor() ]
    return(metaColors)
  }  
  # library(GMD)
  # myCols = cbind(var_to_colors(markerDF, "dx"), var_to_colors(markerDF, "mut")) 
  # rlab=t(cbind(
  #   var_to_colors(markerDF, "post_clustering"),
  #   var_to_colors(markerDF, "dx")
  #   ))
  #   heatmap.2(marker.matrix, key.title="Expression",  col = viridis(300), trace="none",Colv = F, Rowv = F,
  #             labRow = F, xlab = "Biomarker", ylab="Cell", cexCol=1, RowSideColors = var_to_colors(markerDF, "post_clustering")
  #             )
  # heatmap.3(markerMelt, dendrogram = 'row', kr = length(unique(markerDF)), labRow = F, 
  #           xlab = "Biomarker", ylab = "Cell", RowSideColors = rlab, RowSideColorsSize=2 )
   
  
  
  markerDF <- markerDF %>%    
    mutate_at(vars(post_clustering, dx, mut, ID, CellType_DGE, CellType_AvgExp), as.factor) %>% 
    mutate(Cluster = post_clustering) %>%
    arrange(post_clustering) 
  # markerMelt <- reshape2::acast(markerDF, Cell~Gene, value.var="Expression", fun.aggregate = mean, drop = F, fill = 0) 
  markerMelt <- dcast(markerDF,  Cell + post_clustering + dx + mut + ID + CellType_DGE + CellType_AvgExp ~ Gene,
                      fun.aggregate = mean, value.var = "Expression") %>% arrange(post_clustering)
  marker.matrix <- markerMelt[markerList] %>%as.matrix()
  row.names(marker.matrix) <- markerMelt$Cell
  
  ha = HeatmapAnnotation(df = markerDF[c("dx","mut","ID","CellType_DGE","CellType_AvgExp")], which = "row") 
  
  ComplexHeatmap::Heatmap(marker.matrix, col=viridis(300), column_title = "Biomarker", row_title = "Cell",  
                          row_dend_reorder = F,show_row_names = F, show_column_dend = F,show_row_dend =T,
                          cluster_rows = T, column_title_side = "bottom",km = length(unique(markerMelt$post_clustering))) + ha
 

} 
## Error in `[.data.frame`(markerMelt, markerList): undefined columns selected

DGE: All Cells

  • DGE methods available in Seurat include:
  • DESeq2DETest()
  • DiffExpTest()
  • DiffTTest()

PD vs. Controls

## Error in FindMarkers(pbmc, ident.1 = group1, ident.2 = group2, test.use = test.use): No genes pass logfc.threshold threshold

LRRK vs. PD

## Error in FindMarkers(pbmc, ident.1 = group1, ident.2 = group2, test.use = test.use): No genes pass logfc.threshold threshold

CellType_DGE

CellType_DGE

## Error in WhichCells(object = object, ident = ident.1): Identity : CD14++/CD16+ not found.

CellType_AvgExp

## Error in FindMarkers(pbmc, ident.1 = group1, ident.2 = group2, test.use = test.use): No genes pass logfc.threshold threshold

DGE: Within Clusters

Between Disease Groups

## 
## ###  Cluster 0: 
##  PD vs. control
## Error in FindMarkers(pbmc, ident.1 = group1, ident.2 = group2, test.use = test.use): No genes pass logfc.threshold threshold

Between Mutation Groups

## 
## ###  Cluster 0: 
##  LRRK2 vs. PD
## Error in FindMarkers(pbmc, ident.1 = group1, ident.2 = group2, test.use = test.use): No genes pass logfc.threshold threshold

Between Mutation Groups

## 
## ###  Cluster 0: 
##  CD14++/CD16+ vs. CD14++/CD16--
## Error in WhichCells(object = object, ident = ident.1): Identity : CD14++/CD16+ not found.
## 
## ###  Cluster 0: 
##  CD14++/CD16+ vs. CD14++/CD16--
## Error in FindMarkers(pbmc, ident.1 = group1, ident.2 = group2, test.use = test.use): No genes pass logfc.threshold threshold

Try Different Cluster Resolutions

If you perturb some of our parameter choices above (for example, setting resolution=0.8 or changing the number of PCs), you might see the CD4 T cells subdivide into two groups. You can explore this subdivision to find markers separating the two T cell subsets. However, before reclustering (which will overwrite object@ident), we can stash our renamed identities to be easily recovered later.